Generic multiset programming with discrimination-based joins and symbolic Cartesian products
Authors
Abstract
This paper presents GMP, a library for generic, SQL-style programming with multisets. It generalizes the querying core of SQL in several ways: multisets may contain elements of arbitrary first-order data types, including references (pointers), recursive data types, and nested multisets; the library provides an expressive embedded domain-specific language for specifying user-definable equivalence and ordering relations, extending the built-in equality and inequality predicates; it admits mapping arbitrary functions over multisets, not just projections; it supports user-defined predicates in selections; and it allows user-defined aggregation functions. Most significantly, it avoids many cases of asymptotically inefficient nested iteration through Cartesian products that occur in a straightforward stream-based implementation of multisets. It does so with two novel techniques: symbolic (term) representations of multisets, specifically of Cartesian products, which enable dynamic symbolic computation that intersperses algebraic simplification steps with conventional data processing; and discrimination-based joins, a generic technique for computing equijoins based on equivalence discriminators, as an alternative to hash-based and sort-merge joins. Full Haskell source code for GMP is included for experimentation; it builds on generic top-down discrimination, whose code is not included. We provide illustrative examples whose performance indicates that GMP, even without the requisite algorithm and data structure engineering, is a realistic alternative to SQL even for SQL-expressible queries.
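The two techniques named in the abstract can be illustrated with a minimal Haskell sketch. It is not GMP's actual API: the names `MSet`, `Elems`, `Union`, `Cross`, `count`, `elements`, `disc`, and `joinOn` are hypothetical and chosen only for this illustration. The grouping function `disc` is also a stand-in that relies on `Ord` and `Data.Map`, whereas the discriminators the abstract refers to are built generically from equivalence descriptions, so this shows only the shape of the idea.

```haskell
{-# LANGUAGE GADTs #-}
import           Data.Either     (partitionEithers)
import qualified Data.Map.Strict as Map

-- Hypothetical symbolic (term) representation of multisets.
-- Cartesian products are kept as terms instead of being enumerated.
data MSet a where
  Elems :: [a] -> MSet a                    -- explicitly listed elements
  Union :: MSet a -> MSet a -> MSet a       -- multiset union
  Cross :: MSet a -> MSet b -> MSet (a, b)  -- unevaluated Cartesian product

-- Cardinality computed algebraically: a product costs one multiplication,
-- not an enumeration of all pairs.
count :: MSet a -> Int
count (Elems xs)  = length xs
count (Union s t) = count s + count t
count (Cross s t) = count s * count t

-- Enumerate the elements; products are expanded only here, on demand.
elements :: MSet a -> [a]
elements (Elems xs)  = xs
elements (Union s t) = elements s ++ elements t
elements (Cross s t) = [ (x, y) | x <- elements s, y <- elements t ]

-- Stand-in discriminator (assumption: Ord-based, via Data.Map):
-- groups the values of key-value pairs whose keys are equal.
disc :: Ord k => [(k, v)] -> [[v]]
disc kvs = Map.elems (Map.fromListWith (flip (++)) [ (k, [v]) | (k, v) <- kvs ])

-- Equijoin: tag both inputs, discriminate them together on their keys,
-- and return one symbolic product per group with matches on both sides.
joinOn :: Ord k => (a -> k) -> (b -> k) -> MSet a -> MSet b -> MSet (a, b)
joinOn f g s t = foldr Union (Elems []) blocks
  where
    tagged = [ (f x, Left x) | x <- elements s ]
          ++ [ (g y, Right y) | y <- elements t ]
    blocks = [ Cross (Elems ls) (Elems rs)
             | grp <- disc tagged
             , let (ls, rs) = partitionEithers grp
             , not (null ls), not (null rs) ]
```

In this sketch the join result is itself a union of symbolic products, so `count` on it never enumerates the matching pairs, and no pair with differing keys is ever constructed. This is the nested iteration over a full Cartesian product that the abstract says a straightforward stream-based implementation would perform.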
Similar resources
Optimizing relational algebra operations using generic partitioning discriminators and lazy products
We show how to implement in-memory execution of the core relational algebra operations of projection, selection and cross-product efficiently, using discrimination-based joins and lazy products. We introduce the notion of (partitioning) discriminator, which partitions a list of values according to a specified equivalence relation on keys the values are associated with. We show how discriminator...
Optimizing Inequality Joins in Datalog with Approximated Constraint Propagation
Datalog systems evaluate joins over arithmetic (in)equalities as a naive generate-and-test of Cartesian products. We exploit aggregates in a source-to-source transformation to reduce the size of Cartesian products and to improve performance. Our approach approximates the well-known propagation technique from Constraint Programming. Experimental evaluation shows good run time speed-ups on a rang...
Multiset Discrimination − a Method for Implementing Programming Language Systems Without Hashing
It is generally assumed that hashing is essential to many algorithms related to efficient compilation; e.g., symbol table formation and maintenance, grammar manipulation, basic block optimization, and global optimization. This paper questions this assumption, and initiates development of an efficient alternative compiler methodology without hashing or sorting. Underlying this methodology are se...
Higher-Order Containers
Containers are a semantic way to talk about strictly positive types. In previous work it was shown that containers are closed under various constructions including products, coproducts, initial algebras and terminal coalgebras. In the present paper we show that, surprisingly, the category of containers is cartesian closed, giving rise to a full cartesian closed subcategory of endofunctors. The ...
An Algebraic Approach to Stable Domains
Day [75] showed that the category of continuous lattices and maps which preserve directed joins and arbitrary meets is the category of algebras for a monad over Set, Sp or Pos, the free functor being the set of filters of open sets. Separately, Berry [78] constructed a cartesian closed category whose morphisms preserve directed joins and connected meets, whilst Diers [79] considered similar fun...
Journal title:
- Higher-Order and Symbolic Computation
Volume 23, Issue
Pages -
Publication date: 2010